Part II Gaussian random fields and Kriging
The purpose of this computer exercise is to give an introduction to parameter estima- tion and kriging for Gaussian random field models for spatial data.
We assume the following observation model on the unit square:
\[\begin{equation*}
y(\boldsymbol{s}_j) = x(\boldsymbol{s}_j) + \epsilon_j, \ \ \ j = 1, \dots, N,
\end{equation*}\]
where \(\epsilon_j ~ \mathop{N}(0, \tau^2)\) are independent measurement noise terms. Further, consider a Matérn covariance function for the Gaussian random field \(\boldsymbol{x(s)}\):
\[\begin{equation*}
Cov(x(\boldsymbol{s}_i), x(\boldsymbol{s}_j)) = \Sigma_{i, j} = \sigma^2(1 + \phi h)\exp(-\phi h),
\end{equation*}\]
where \(h\) denotes the Euclidean distance between the two sites \(\boldsymbol{s}_i\) and \(\boldsymbol{s}_j\).
We assume the mean increases with east and north coordinates as follows: \(\mu_j = \alpha((s_{j1} - 0.5) + (s_{j2} - 0.5))\), for site \(\boldsymbol{s}_j = (s_{j1}, s_{j2})\) on the unit square.
2.2 Paramter estimation
We will now use the simulated data to estimate the model parameters \(\alpha, \sigma^2, \tau^2, \phi\) using maximum likelihood estimation. Iterate between the update for the mean parameter, and updating the covariance parameters. Monitor the likelihood function at each step of the algorithm to check convergence.
The mean of the field is modelled by \(p(\boldsymbol{x})\) and the imperfect information \(\boldsymbol{y} = (y_1, \dots, y_m)\) conditional on \(\boldsymbol{x}\) can be modelled by \(p(\boldsymbol{y}|\boldsymbol{x})\), which can be expressed as follows: \[\begin{equation*}
p(\boldsymbol{x}) = \mathop{N}(\boldsymbol{H}\beta, \boldsymbol{\Sigma}), \quad p(\boldsymbol{y}|\boldsymbol{x}) = \mathop{N}(\boldsymbol{Fx}, \boldsymbol{T})
\end{equation*}\] Therefore, the marginal likelihood of the data is \[\begin{equation*}
p(\boldsymbol{y}) = \mathop{N}(\boldsymbol{G\beta}, \boldsymbol{C}), \quad \boldsymbol{G} = \boldsymbol{FH}, \quad \boldsymbol{C} = \boldsymbol{F\Sigma F^T} + \boldsymbol{T}
\end{equation*}\] The log-likelihood as a function of \(\beta\) and unknown fixed nuisance parameters \(\boldsymbol{\theta}\) in the prior covariance matrix \(\boldsymbol{\Sigma} = \boldsymbol{\Sigma(\boldsymbol{\theta})}\), and/or the likelihood noise matrix \(\boldsymbol{T} = \boldsymbol{T}(\boldsymbol{\theta})\) becomes \[\begin{equation*}
l(\boldsymbol{\theta}, \boldsymbol{\beta}) = -\frac{m}{2} \log(2\pi) - \frac{1}{2}\log|\boldsymbol{C}| - \frac{1}{2}(\boldsymbol{y} - \boldsymbol{G}\boldsymbol{\beta})^T\boldsymbol{C}^{-1}(\boldsymbol{y} - \boldsymbol{G\beta})
\end{equation*}\]
The MLEs of \(\boldsymbol{\beta}\) and \(\boldsymbol{\theta}\) are obtained by \[\begin{equation*}
(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\theta}}) = \underset{\boldsymbol{\beta}, \boldsymbol{\theta}}{\arg\max}\{l(\boldsymbol{\beta, \theta})\}
\end{equation*}\] For fixed \(\boldsymbol{\theta}\), the MLEs of \(\boldsymbol{\beta}\) can be determined analytically. \[\begin{equation*}
\frac{dl}{d\boldsymbol{\beta}} = \boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{y} - \boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{G}\boldsymbol{\beta} = \boldsymbol{0}, \quad \hat{\boldsymbol{\beta}} = (\boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{G})^{-1}\boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{y}
\end{equation*}\] Whereas for fixed \(\boldsymbol{\beta}\), the MLE of nuisance parameters \(\boldsymbol{\theta}\) can be obtained by numerical maximization. Let \(\boldsymbol{z} = \boldsymbol{y} - \boldsymbol{G\beta}\), and \(\boldsymbol{Q} = \boldsymbol{C}^{-1}\). For each component of \(\boldsymbol{\theta}_r\), \(r = 1, \dots, d\), in this case, \(\theta\) has 3 components (\(\sigma^2, \eta, \tau^2\)). The score of the log-likelihood becomes \[\begin{equation*}
\frac{dl}{d\theta_r} = -\frac{1}{2}\text{trace}(\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_r}) + \frac{1}{2}\boldsymbol{z}^T\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_r}\boldsymbol{Qz}
\end{equation*}\]
The above mentioned score can be solved iteratively using Fisher scoring algorithm. To achieve the numerical stability, the expected Hessian is applied, which is \[\begin{equation*}
\mathop{E}(\frac{d^2l}{d\theta_rd\theta_{\bar{r}}}) = -\frac{1}{2}\text{trace}(\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_{\bar{r}}}\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_r})
\end{equation*}\]
## [1] "1.16100274094671 , iter no is 0"
## [1] "0.0983478023616246 , iter no is 1"
## [1] "0.0637476756371519 , iter no is 2"
## [1] "0.0330323876752982 , iter no is 3"
## [1] "0.0216765191888463 , iter no is 4"

##
## Estimated sigma is 0.785 ; True sigma is 1
## Estimated phi is 12.72 ; True phi is 10
## Estimated tau is -0.06224 ; True tau is 0.05
## Estimated beta1 is -1.47 ; True beta1 is -1
## Estimated beta2 is 1.63 ; True beta2 is 1
## Estimated beta3 is 1.88 ; True beta3 is 1NULL
2.3 Kriging
We will now use the estimated model parameters to perform kriging prediction. Predict variables \(x(s)\), where predictions sites lie on a regular grid of size 25x25 for the unit square. Visualize the Kriging surface and the prediction standard error. Compare with the true field.


## [1] "The prediction error is 0.303326130373502"
Part III Integrated nested Laplace Approximations (INLA)
3.1 Simple Linear Regression
In the last part of this exercise we explore the R-INLA package.
The ski jumping data set contains 26 observations of measured lengths in ski jumping competitions (in meters) between 1961 and 2011.
In Figure we observe a clear (almost linear) trend in the measured jumping lengths to increase with the years.
Therefore, we use linear approach for the statistical modelling of this data, where the years \(x_i\) are the covariates and the lengths \(y_i\) are the responses for \(i=1,\dots,26\): \[\begin{equation*}
\mathbb{E}[y_i] = \mu + \beta x_i,\quad \mathbb{Var}[y_i]=\tau^{-1}.
\end{equation*}\]
This can be posed as a latent Gaussian model suited for the INLA framework.
- The response depends on the linear predictors \(\eta\) as \(y|x,\theta = \Pi \pi(y_i|\eta_i,\tau)\) where \(\pi(y_i|\eta_i,\tau)\sim\mathcal{N}(\eta_i,\tau^{-2})\)
- The parameter of the linear predictor \(eta_i=\mu+x_i\beta\) are independent Gaussian with some variance and mean zero.
- The hyperparameter \(\tau\) is only one-dimensional.
##
## Call:
## "inla(formula = Length ~ Year, data = skiData)"
## Time used:
## Pre = 4.73, Running = 0.494, Post = 0.259, Total = 5.48
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -4029.646 106.615 -4240.649 -4029.650 -3818.934 -4029.646 0
## Year 2.126 0.054 2.019 2.126 2.232 2.126 0
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for the Gaussian observations 0.072 0.02 0.038 0.07
## 0.975quant mode
## Precision for the Gaussian observations 0.116 0.066
##
## Expected number of effective parameters(stdev): 2.00(0.00)
## Number of equivalent replicates : 13.00
##
## Marginal log-Likelihood: -89.58
## Mean 3.83993
## Stdev 0.552976
## Quantile 0.025 2.93482
## Quantile 0.25 3.44617
## Quantile 0.5 3.77761
## Quantile 0.75 4.16357
## Quantile 0.975 5.09958
FB: What does all this mean?…..
3.2 GLMM with random effects
Last, we use INLA to analyse the “Seeds” data set.
##
## Call:
## c("inla(formula = formula, family = \"binomial\", data = data, Ntrials
## = n, ", " control.predictor = list(compute = TRUE), control.family =
## list(link = \"logit\"))" )
## Time used:
## Pre = 5.06, Running = 0.456, Post = 0.251, Total = 5.77
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.429 0.115 -0.656 -0.429 -0.204 -0.428 0
## x1 -0.272 0.156 -0.580 -0.272 0.033 -0.271 0
## x2 1.066 0.146 0.782 1.066 1.353 1.065 0
##
## Random effects:
## Name Model
## plate IID model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for plate 18930.69 20071.40 30.63 12701.76 73765.97 7.83
##
## Expected number of effective parameters(stdev): 3.25(1.16)
## Number of equivalent replicates : 6.46
##
## Marginal log-Likelihood: -73.75
## Posterior marginals for the linear predictor and
## the fitted values are computed


---
title: "Project2"
subtitle: "Gaussian random field with application of INLA"
author: $\overset{\mathrm{yaolin.ge@ntnu.no}}{10026}$ \and
        $\overset{\mathrm{florian.beiser@ntnu.no}}{10026}$
output:
  # pdf_document:
  #   toc: true
  #   toc_depth: 3
  #   number_sections: true
  html_document:
    toc: yes
    toc_float: yes
    code_download: yes
    toc_depth: 3
  ioslides_presentation: default
  beamer_presentation:
    slide_level: 1
    keep_tex: yes
  pdf_document:
    toc: yes
    toc_depth: 2
---

```{r setup, include=TRUE,echo=FALSE}
suppressPackageStartupMessages(library(knitr))
knitr::opts_chunk$set(echo = FALSE, message=FALSE,warning = FALSE, error = FALSE)
```

```{r load libraries, echo = F}
library(lattice)
library(viridisLite)
coul <- viridis(100)
```

# Part I Multivariate normal distribution
Let $\boldsymbol{x} = (x_1, \dots, x_n), n = 100$ be multivariate normal distributed with $E(x_i) = 0, Var(x_i) = 1$, and $Corr(x_i, x_j) = e^{-0.1|i - j|}$

a) Compute and image the covariance matrix $\boldsymbol{\Sigma}$ of $\boldsymbol{x}$

b) Find the lower Cholesky factor $\boldsymbol{L}$, such that $\boldsymbol{L}\boldsymbol{L}^T = \boldsymbol{\Sigma}$, of this covariance matrix, and image.

c) Sample $\boldsymbol{x} = \boldsymbol{Lz}$, where $\boldsymbol{z}$ is a length n random vector of independent standard normal variables. Plot the sample.

d) Find the precision matrix $\boldsymbol{Q}$ of the covariance matrix, and compute the lower Cholesky factor $\boldsymbol{L}_Q$, such that $\boldsymbol{L}_Q\boldsymbol{L}_Q^T = \boldsymbol{Q}$, of this matrix. Image these matrices and compare them to the images obtained in a) and b)

e) Sample $\boldsymbol{x}$ by solving $\boldsymbol{L}_Q^T\boldsymbol{x} = \boldsymbol{z}$, where $\boldsymbol{z}$ is a length n random vector of independen standard normal variables. Plot the sample.

f) Permute the ordering of variables in $\boldsymbol{x}$, and redo the exercises. 


***

## Solution to Part I

### a) 
Given that $\boldsymbol{\Sigma} = e^{-0.1|i - j|}$. The covariance matrix can be expressed as follows:
\begin{equation*}
    \Sigma = \begin{pmatrix} 
    1 &  e^{- 0.1 h_{12}} &\dots & e^{- 0.1 h_{1n}} \\
    e^{- 0.1 h_{21}} & 1 & \dots & e^{- 0.1 h_{2n}} \\
    \vdots & \vdots & \ddots & \vdots \\
    e^{- 0.1 h_{n1}} & e^{- 0.1 h_{n2}} & \dots & 1
    \end{pmatrix}
\end{equation*}

```{r covariance matrix, echo = F, fig.align='center'}
# sizes
n <- 100

# define regular grid of locations
sites1v <- array((1:n),c(n,1))

# Prior mean
m <- 0
# compute East and North distances on grid
ww <- array(1,c(n,1))

# determine the distance matrix
H <- abs(sites1v%*%t(ww)-ww %*% t(sites1v))

# Exponential covariance model
Sigma <- exp(-0.1*H)

# Plot the covariance matrix
levelplot(t(Sigma), 
          col.regions = coul, main = "Covariance matrix",
          ylim=c(100,1))
```
### b) 
According to the cholesky decomposition rule, $\boldsymbol{L}$ is the lower triangular matrix for $\boldsymbol{\Sigma}$, it can be easily computed from R using `L = chol(Sigma)`. It is then plotted as below. 

```{r Cholesky, echo = F, fig.align='center'}
# Using R function to calculate Cholesky decomposition
L <- chol(Sigma)
levelplot(t(L), ylim=c(100,1), col.regions = coul, main = "Lower triangular matrix")

# FB: Attention!! chol() gives what is L^t in our exercise sheet, i.e. L=chol(Sigma) is an upper but NOT lower matrix!
```


### c) 
Sample using $\boldsymbol{x} = \boldsymbol{L}\boldsymbol{z}$ transforms the zero-mean, standard normal random variales to the random variables with the desired covariance matrix. 

```{r, random samples, fig.align='center'}
z = rnorm(100)
x = L %*% z
plot(x, main = "Random samples given the covariance")
```


### d) 
The precision matrix $\boldsymbol{Q}$ is the inverse of the covariance matrix $\boldsymbol{\Sigma}$, it is computed using `Q = solve(Sigma)` in R. The three matrices are thereby depicted as follows. Since the covariance matrix is not singular, given that the it belongs to the Matern family, thus it is analytically guaranteed to have positive definite property. Therefore, both precision matrix and the lower triangular precision matrix exist. 

```{r comp between three matrices, fig.show = "hold", out.width="50%"}
Q <- solve(Sigma)
LQ <- chol(Q)
par(mar = c(4,4,.1, .1))
# par(mfrow=c(1,3), mar=c(4,4,4,1), oma=c(0.5,0.5,0.5,0))
levelplot(Sigma, col.regions = coul, main = "Covariance matrix")
levelplot(L, col.regions = coul, main = "Lower triangular covariance matrix")
levelplot(Q, col.regions = coul, main = "Precision matrix")
levelplot(LQ, col.regions = coul, main = "Lower triangular precision matrix")
# levelplot(LQ, col.regions = coul, main = "Lower triangular matrix")
```


### e) 
Similarly, the expected random samples can be generated using the inversion of the above formula, thus $\boldsymbol{L}_Q^T\boldsymbol{x} = \boldsymbol{z}$

```{r, sample in inversion way, fig.align='center'}
z = rnorm(100)
x = solve(t(LQ), z)
plot(x, main = "Random samples using inversion rule")
```

### f) 
Permute $\boldsymbol{x}$ to make randomise the ordering of the grid, the asscociated covariance matrix can be thereby modified in a sparse way. 

```{r, permutation, fig.align='center'}
n_per = 3
Sigma_per <- list()
L_per <- list()
Q_per <- list()
LQ_per <- list()
for (i in c(1:n_per)){
  sites1v_per <- array(sample(sites1v, size = n, replace = FALSE), c(n,1))
  # determine the distance matrix
  H_per <- abs(sites1v_per%*%t(ww)-ww %*% t(sites1v_per))
  # Exponential covariance model
  Sigma_temp <- exp(-0.1*H_per)
  Sigma_per[[i]] <- Sigma_temp
  # L matrix
  L_temp <- chol(Sigma_temp)
  L_per[[i]] <- L_temp
  
  z = rnorm(100)
  x_per = L_temp %*% z
  plot(x_per, main = paste(i, "Permuted random samples given the covariance"))
  
  Q_temp <- solve(Sigma_temp)
  LQ_temp <- chol(Q_temp)
  Q_per[[i]] <- Q_temp
  LQ_per[[i]] <- LQ_temp
  z = rnorm(100)
  x_per = solve(t(LQ_temp), z)
  plot(x_per, main = paste(i, "Permuted random samples using inversion rule"))
}


```

```{r, cont plotting 1, fig.show = "hold", out.width="50%"}
i = 1
par(mar = c(4, 4,.1, .1))
# par(mfrow=c(1,3), mar=c(4,4,4,1), oma=c(0.5,0.5,0.5,0))
levelplot(Sigma_per[[i]], col.regions = coul, main = paste(i, "Permuted covariance matrix"))
levelplot(L_per[[i]], col.regions = coul, main = paste(i,"Permuted lower triangular covariance matrix"))
levelplot(Q_per[[i]], col.regions = coul, main = paste(i,"Permuted precision matrix"))
levelplot(LQ_per[[i]], col.regions = coul, main = paste(i,"Permuted lower triangular precision matrix"))
```

```{r, cont plotting 2, fig.show = "hold", out.width="50%"}
i = 2
par(mar = c(4, 4,.1, .1))
# par(mfrow=c(1,3), mar=c(4,4,4,1), oma=c(0.5,0.5,0.5,0))
levelplot(Sigma_per[[i]], col.regions = coul, main = paste(i, "Permuted covariance matrix"))
levelplot(L_per[[i]], col.regions = coul, main = paste(i,"Permuted lower triangular covariance matrix"))
levelplot(Q_per[[i]], col.regions = coul, main = paste(i,"Permuted precision matrix"))
levelplot(LQ_per[[i]], col.regions = coul, main = paste(i,"Permuted lower triangular precision matrix"))
```

```{r, cont plotting 3, fig.show = "hold", out.width="50%"}
i = 3
par(mar = c(4, 4,.1, .1))
# par(mfrow=c(1,3), mar=c(4,4,4,1), oma=c(0.5,0.5,0.5,0))
levelplot(Sigma_per[[i]], col.regions = coul, main = paste(i, "Permuted covariance matrix"))
levelplot(L_per[[i]], col.regions = coul, main = paste(i,"Permuted lower triangular covariance matrix"))
levelplot(Q_per[[i]], col.regions = coul, main = paste(i,"Permuted precision matrix"))
levelplot(LQ_per[[i]], col.regions = coul, main = paste(i,"Permuted lower triangular precision matrix"))
```



# Part II Gaussian random fields and Kriging

The purpose of this computer exercise is to give an introduction to parameter estima- tion and kriging for Gaussian random field models for spatial data.

We assume the following observation model on the unit square:

\begin{equation*}
    y(\boldsymbol{s}_j) = x(\boldsymbol{s}_j) + \epsilon_j, \ \ \  j = 1, \dots, N,
\end{equation*}

where $\epsilon_j ~ \mathop{N}(0, \tau^2)$ are independent measurement noise terms. Further, consider a Matérn covariance function for the Gaussian random field $\boldsymbol{x(s)}$:

\begin{equation*}
    Cov(x(\boldsymbol{s}_i), x(\boldsymbol{s}_j)) = \Sigma_{i, j} = \sigma^2(1 + \phi h)\exp(-\phi h),
\end{equation*}

where $h$ denotes the Euclidean distance between the two sites $\boldsymbol{s}_i$ and $\boldsymbol{s}_j$.

We assume the mean increases with east and north coordinates as follows: $\mu_j = \alpha((s_{j1} - 0.5) + (s_{j2} - 0.5))$, for site $\boldsymbol{s}_j = (s_{j1}, s_{j2})$ on the unit square. 

*** 

## 2.1 Simulation

Simulate $N = 200$ random sites in the unit square and plot them. Form the covariance matrix using $\sigma = 1, \phi = 10, \tau = 0.05$. Take its Cholesky decomposition and simulate dependent zero-mean Gaussian data variables, then add the mean using $\alpha = 1$. Plot your observations.


The true mean of the field is expressed as 
    \begin{equation*}
        \mu_i = \alpha((s_{i1} - 0.5) + (s_{i2} - 0.5)) = -\alpha + \alpha s_{i1} + \alpha s_{i2} = \underbrace{
        \begin{bmatrix}
        1 & s_{i1} & s_{i2} \\
        \end{bmatrix}}_{\boldsymbol{h}^T(\boldsymbol{s}_i)}
        \underbrace{
        \begin{bmatrix}
        -\alpha \\
        \alpha \\
        \alpha 
        \end{bmatrix}}_{\boldsymbol{\beta}}
    \end{equation*}
    where $s_{i1}, s_{i2}$ are the location from east and north direction in the grid. 

```{r GP simulation}

Matern_cov <- function(sigma, phi, t){
    # param sigma: scaling coef
    # param eta: range coef
    # param t: distance matrix
    # return: matern covariance
    return(sigma ^ 2 * (1 + phi * t) * exp(-phi * t))
}

plotf <- function(v, string){
  vv <- v
  dim(vv) <- c(n1, n2)
  levelplot(vv, col.regions = coul, main = string)
}

# Setup the grid
n1 = 25 # number of grid points along east direction
n2 = 25 # number of grid points along north direction
n = n1 * n2 # total number of grid points

dn1 = 1/n1
dn2 = 1/n2
sites1 = array(seq(0, 1, dn1), c(n1, 1))
sites2 = array(seq(0, 1, dn2), c(n2, 1))
# sites1 = array(seq(1, n1), c(n1, 1))
# sites2 = array(seq(1, n2), c(n2, 1))
ww1 = rep(1, n1)
ww2 = rep(1, n2)
sites1m = sites1 %*% t(ww1) # sites1m is the matrix version of sites1
sites2m = ww2 %*% t(sites2)

sites1v = matrix(sites1m, nrow = n, ncol = 1)
sites2v = matrix(sites2m, nrow = n, ncol = 1)

plot(sites1v, sites2v)

# Compute the distance matrix
ddE = sites1v %*% matrix(rep(1, n), nrow = 1, ncol = n) - matrix(rep(1, n), nrow = n, ncol = 1) %*% t(sites1v)
dd2E = ddE * ddE
ddN = sites2v %*% matrix(rep(1, n), nrow = 1, ncol = n) - matrix(rep(1, n), nrow = n, ncol = 1) %*% t(sites2v)
dd2N = ddN * ddN
t = sqrt(dd2E + dd2N)
levelplot(t, col.regions = coul, main = "Distance matrix")

# Simulate the initial random field
alpha = 1.0 # beta as in regression model
sigma = 1.0  # scaling coef in matern kernel
phi = 10 # range coef in matern kernel
# eta = 10 # range coef in matern kernel
tau = .05 # iid noise

beta1 = -alpha
beta2 = alpha
beta3 = alpha

BETA_TRUE = matrix(c(beta1, beta2, beta3), nrow = 3, ncol = 1)
THETA_TRUE = matrix(c(sigma, phi, tau), nrow = 3, ncol = 1)

Sigma = Matern_cov(sigma, phi, t)  # matern covariance

L = chol(Sigma) # lower L
x = t(L) %*% rnorm(n) # sample from zero mean random variables

H = array(c(rep(1, n), sites1v, sites2v), dim = c(n, 3)) # design matrix
mu_prior = H %*% BETA_TRUE
plotf(mu_prior, "prior mean")
mu_real = mu_prior + x
plotf(mu_real, "realisation of the grf")

# sampling from realisations
M = 200
Fmatrix = matrix(0, M, n)
ind = sample(n, size = M, replace = FALSE)
for (i in c(1:M)){
  Fmatrix[i, ind[i]] = TRUE
}
G = Fmatrix %*% H
y_sampled = Fmatrix %*% mu_real + tau * rnorm(M, 1)
x_ind = sites1v[ind]
y_ind = sites2v[ind]

plot(x_ind, y_ind, cex = abs(y_sampled), main = "Random samples in the field, circle size indicates the relative value")

```

***

## 2.2 Paramter estimation

We will now use the simulated data to estimate the model parameters $\alpha, \sigma^2, \tau^2, \phi$ using maximum likelihood estimation. Iterate between the update for the mean parameter, and updating the covariance parameters. Monitor the likelihood function at each step of the algorithm to check convergence.

The mean of the field is modelled by $p(\boldsymbol{x})$ and the imperfect information $\boldsymbol{y} = (y_1, \dots, y_m)$ conditional on $\boldsymbol{x}$ can be modelled by $p(\boldsymbol{y}|\boldsymbol{x})$, which can be expressed as follows:
    \begin{equation*}
        p(\boldsymbol{x}) = \mathop{N}(\boldsymbol{H}\beta, \boldsymbol{\Sigma}), \quad p(\boldsymbol{y}|\boldsymbol{x}) = \mathop{N}(\boldsymbol{Fx}, \boldsymbol{T})
    \end{equation*}
    Therefore, the marginal likelihood of the data is 
    \begin{equation*}
        p(\boldsymbol{y}) = \mathop{N}(\boldsymbol{G\beta}, \boldsymbol{C}), \quad \boldsymbol{G} = \boldsymbol{FH}, \quad \boldsymbol{C} = \boldsymbol{F\Sigma F^T} + \boldsymbol{T}
    \end{equation*}
    The log-likelihood as a function of $\beta$ and unknown fixed nuisance parameters $\boldsymbol{\theta}$ in the prior covariance matrix $\boldsymbol{\Sigma} = \boldsymbol{\Sigma(\boldsymbol{\theta})}$, and/or the likelihood noise matrix $\boldsymbol{T} = \boldsymbol{T}(\boldsymbol{\theta})$ becomes
    \begin{equation*}
        l(\boldsymbol{\theta}, \boldsymbol{\beta}) = -\frac{m}{2} \log(2\pi) - \frac{1}{2}\log|\boldsymbol{C}| - \frac{1}{2}(\boldsymbol{y} - \boldsymbol{G}\boldsymbol{\beta})^T\boldsymbol{C}^{-1}(\boldsymbol{y} - \boldsymbol{G\beta})
    \end{equation*}

 The MLEs of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ are obtained by
    \begin{equation*}
        (\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\theta}}) = \underset{\boldsymbol{\beta}, \boldsymbol{\theta}}{\arg\max}\{l(\boldsymbol{\beta, \theta})\}
    \end{equation*}
    For fixed $\boldsymbol{\theta}$, the MLEs of $\boldsymbol{\beta}$ can be determined analytically.
    \begin{equation*}
        \frac{dl}{d\boldsymbol{\beta}} = \boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{y} - \boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{G}\boldsymbol{\beta} = \boldsymbol{0}, \quad \hat{\boldsymbol{\beta}} = (\boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{G})^{-1}\boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{y}
    \end{equation*}
    Whereas for fixed $\boldsymbol{\beta}$, the MLE of nuisance parameters $\boldsymbol{\theta}$ can be obtained by numerical maximization. Let $\boldsymbol{z} = \boldsymbol{y} - \boldsymbol{G\beta}$, and $\boldsymbol{Q} = \boldsymbol{C}^{-1}$. For each component of $\boldsymbol{\theta}_r$, $r = 1, \dots, d$, in this case, $\theta$ has 3 components ($\sigma^2, \eta, \tau^2$). The score of the log-likelihood becomes
    \begin{equation*}
        \frac{dl}{d\theta_r} = -\frac{1}{2}\text{trace}(\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_r}) + \frac{1}{2}\boldsymbol{z}^T\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_r}\boldsymbol{Qz}
    \end{equation*}


 The above mentioned score can be solved iteratively using Fisher scoring algorithm. To achieve the numerical stability, the expected Hessian is applied, which is 
    \begin{equation*}
        \mathop{E}(\frac{d^2l}{d\theta_rd\theta_{\bar{r}}}) = -\frac{1}{2}\text{trace}(\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_{\bar{r}}}\boldsymbol{Q}\frac{d\boldsymbol{C}}{d\theta_r})
    \end{equation*}
    
\begin{algorithm}[H]
     \KwData{initial $\beta_0$, $\theta_0$}
     \KwResult{Converged $\hat{\beta}$, $\hat{\theta}$}
     \While{not converged}{
      $\boldsymbol{C} = \boldsymbol{C}(\boldsymbol{\theta}^b)$\;
      $\boldsymbol{\beta}^{b+1} = [\boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{G}]^{-1}\boldsymbol{G}^T\boldsymbol{C}^{-1}\boldsymbol{y}$\;
      $\boldsymbol{z} = \boldsymbol{y} - \boldsymbol{G}\boldsymbol{\beta}^{b + 1}$\;
      $\boldsymbol{Q} = \boldsymbol{C}^{-1}$ \;
      $\boldsymbol{C}^*_r = \frac{d\boldsymbol{C}(\boldsymbol{\theta}^b)}{d\theta_r}, \quad r = 1, \dots, d$ \; 
      $u_r = \frac{dl}{d\theta_r} = -\frac{1}{2}\text{trace}(\boldsymbol{Q}\boldsymbol{C}_r^*) + \frac{1}{2}\boldsymbol{z}^T\boldsymbol{Q}\boldsymbol{C}_r^*\boldsymbol{Qz}$\;
      $\boldsymbol{V}_{r\bar{r}}=\mathop{E}(\frac{d^2l}{d\theta_rd\theta_{\bar{r}}}) = -\frac{1}{2}\text{trace}(\boldsymbol{Q}\boldsymbol{C}_r^*\boldsymbol{Q}\boldsymbol{C}_{\bar{r}}^*)$\;
      $\boldsymbol{\theta}^{b + 1} = \boldsymbol{\theta}^b + \boldsymbol{V}^{-1}\boldsymbol{u}$\;
      $b = b + 1$
      }
    %  \caption{How to write algorithms}
    \end{algorithm}
    
```{r, parameter estimation}
#%%
C_matrix <- function(theta){
  sigma = theta[1]
  phi = theta[2]
  tau = theta[3]
  Sigma = Matern_cov(sigma, phi, t)
  C = Fmatrix %*% Sigma %*% t(Fmatrix) + diag(M) * tau^2
  return(C)
}


dC_dsigma <- function(theta){
  # sigma = theta$sigma
  # phi = theta$phi
  # tau = theta$tau
  sigma = theta[1]
  phi = theta[2]
  tau = theta[3]
  Km = Matern_cov(1.0, phi, t) # t here is the distance matrix, H is the design matrix, similar to X
  dC_dsgm = Fmatrix %*% Km %*% t(Fmatrix)
  return(dC_dsgm)
}

dC_dphi <- function(theta){
  sigma = theta[1]
  phi = theta[2]
  tau = theta[3]
  Kn = sigma ^ 2 * (-phi * t) * exp(-phi * t)
  return(Fmatrix %*% Kn %*% t(Fmatrix))
}

dC_dtau <- function(theta){
  return(diag(M))
}

# Use fisher scoring to find MLE parameters
# beta = np.zeros([3, 1])
beta = matrix(c(-2.1, 3.1, .9), nrow = 3, ncol = 1)
theta = rbind(.245, 9.3, .003)
MAX_ITER = 5
No_iter = 0
epsilon = 10
Beta = matrix(0, nrow = MAX_ITER, ncol = 3)
Likelihood = matrix(0, nrow = MAX_ITER, ncol = 1)

while (No_iter < MAX_ITER & epsilon > .0001){
    C = C_matrix(theta)
    beta = solve(t(G) %*% solve(C, G), t(G) %*% solve(C, y_sampled))
    # beta = np.linalg.solve(np.dot(G.T, np.linalg.solve(C, G)), np.dot(G.T, np.linalg.solve(C, y_sampled)))
    Beta[No_iter, ] = t(beta)
    z = y_sampled - G %*% beta
    lik = -M/2 * log(2 * pi) - 1/2 * log(det(C)) -  1/2 * t(z) %*% solve(C, z) # otherwise, it becomes inf
    Likelihood[No_iter, ] = lik

    # Find dC*/dtheta
    dC_dSgm = dC_dsigma(theta)
    dC_dPhi = dC_dphi(theta)
    dC_dTau = dC_dtau(theta)

    u_sigma = -1/2 * sum(diag(solve(C, dC_dSgm))) + 1/2 * t(z) %*% solve(C, dC_dSgm %*% solve(C, z))
    u_eta = -1 / 2 * sum(diag(solve(C, dC_dPhi))) + 1 / 2 * t(z) %*% solve(C, dC_dPhi %*% solve(C, z))
    u_tau = -1 / 2 * sum(diag(solve(C, dC_dTau))) + 1 / 2 * t(z) %*% solve(C, dC_dTau %*% solve(C, z))

    u = rbind(u_sigma, u_eta, u_tau)

    V11 = -1/2 * sum(diag(solve(C, (dC_dSgm %*% solve(C, dC_dSgm)))))
    V12 = -1 / 2 * sum(diag(solve(C, (dC_dSgm %*% solve(C, dC_dPhi)))))
    V13 = -1 / 2 * sum(diag(solve(C, (dC_dSgm %*% solve(C, dC_dTau)))))
    V21 = -1 / 2 * sum(diag(solve(C, (dC_dPhi %*% solve(C, dC_dSgm)))))
    V22 = -1 / 2 * sum(diag(solve(C, (dC_dPhi %*% solve(C, dC_dPhi)))))
    V23 = -1 / 2 * sum(diag(solve(C, (dC_dPhi %*% solve(C, dC_dTau)))))
    V31 = -1 / 2 * sum(diag(solve(C, (dC_dTau %*% solve(C, dC_dSgm)))))
    V32 = -1 / 2 * sum(diag(solve(C, (dC_dTau %*% solve(C, dC_dPhi)))))
    V33 = -1 / 2 * sum(diag(solve(C, (dC_dTau %*% solve(C, dC_dTau)))))

    V = matrix(c(V11, V12, V13, V21, V22, V23, V31, V32, V33), nrow = 3, ncol = 3)
    # print(lik)
    # print(No_iter)
    theta_new = theta - solve(V, u)  # here it is minus, but in the book, it says plus, needs to be rechecked
    epsilon = norm(theta_new - theta, type = "2") / norm(beta, type = "2")
    theta = theta_new
    print(paste(epsilon , " , iter no is ", No_iter))
    No_iter = No_iter + 1
}
# print(beta)
# print(BETA_TRUE)
# print(theta)
# print(THETA_TRUE)

plot(c(1:No_iter), Likelihood, main = "likelihood function")

# alphah = (sum(np.abs(beta)) / 3).squeeze()
thetah = theta
sigmah = theta[1]
phih = theta[2]
tauh = theta[3]
betah = beta
beta1 = beta[1]
beta2 = beta[2]
beta3 = beta[3]

# print('Estimated sigma is ', sigmah, "\nEstimated eta is ", etah, \
#       "\nEstimated tau is ", tauh, "\nEstimated alpha is ", alphah)
print(cat("\nEstimated sigma is ", round(sigmah, digits = 3), "; True sigma is ", THETA_TRUE[1], 
            "\nEstimated phi is ", round(phih, digits = 2), "; True phi is ", THETA_TRUE[2], 
            "\nEstimated tau is ", round(tauh, digits = 5), "; True tau is ", THETA_TRUE[3], 
            "\nEstimated beta1 is ", round(beta1, digits = 2), "; True beta1 is ", BETA_TRUE[1], 
            "\nEstimated beta2 is ", round(beta2, digits = 2), "; True beta2 is ", BETA_TRUE[2], 
            "\nEstimated beta3 is ", round(beta3, digits = 2), "; True beta3 is ", BETA_TRUE[3]))


```


*** 

## 2.3 Kriging

We will now use the estimated model parameters to perform kriging prediction. Predict variables $x(s)$, where predictions sites lie on a regular grid of size 25x25 for the unit square. Visualize the Kriging surface and the prediction standard error. Compare with the true field.

```{r, kriging}
Sigmah = Matern_cov(sigmah, phih, t) # estimated covariance matrix
Lh = chol(Sigmah)
mh = H %*% betah + Lh %*% rnorm(n)
Ch = C_matrix(thetah)
xp = mh + Sigmah %*% t(Fmatrix) %*% solve(C, (y_sampled - Fmatrix %*% mh))
plotf(xp, "posterior mean")
Sigmap = Sigmah - Sigmah %*% t(Fmatrix) %*% solve(Ch, Fmatrix %*% Sigmah)
# plotf(Sigmap, "posterior covariance")
estd = sqrt(diag(Sigmap))
plotf(estd, "posterior std")
MSE = sqrt(sum(abs(xp - mu_real) ** 2) / n)
print(paste("The prediction error is ", MSE))
```

***

# Part III Integrated nested Laplace Approximations (INLA)

## 3.1 Simple Linear Regression

In the last part of this exercise we explore the R-INLA package.

```{r INLA1, echo=FALSE}
# 1 Load R-INLA package
library(INLA)

# 2 Load and save SkiPump-dataset
skiData = read.table("SkiJump.txt", header=TRUE)
```

The ski jumping data set contains 26 observations of measured lengths in ski jumping competitions (in meters) between 1961 and 2011. 

```{r INLA2, echo=FALSE, fig.cap="\\label{fig:skiData}Visualisation of the ski jumping data set"}
# 3 Visualisation
plot(skiData$Year, skiData$Length,
     xlab="year", ylab="length",
     main="Ski jumping data")
```

In Figure \ref{fig:skiData} we observe a clear (almost linear) trend in the measured jumping lengths to increase with the years.

Therefore, we use linear approach for the statistical modelling of this data, where the years $x_i$ are the covariates and the lengths $y_i$ are the responses for $i=1,\dots,26$:
\begin{equation*}
\mathbb{E}[y_i] = \mu + \beta x_i,\quad \mathbb{Var}[y_i]=\tau^{-1}.
\end{equation*}

This can be posed as a latent Gaussian model suited for the INLA framework.

1.    The response depends on the linear predictors $\eta$ as $y|x,\theta = \Pi \pi(y_i|\eta_i,\tau)$ where $\pi(y_i|\eta_i,\tau)\sim\mathcal{N}(\eta_i,\tau^{-2})$ 
2.    The parameter of the linear predictor $eta_i=\mu+x_i\beta$ are independent Gaussian with some variance and mean zero. 
3.    The hyperparameter $\tau$ is only one-dimensional. 

```{r INLA3, echo=FALSE, fig.width=12, fig.height=4, fig.cap="\\label{fig:INLA}Posterior marginals for selected effects"}
# 4 Linear regression using R-INLA
res = inla(Length ~ Year, data=skiData)

# Post processing
summary(res)

par(mfrow=c(1,3))

plot(inla.smarginal(res$marginals.fixed[[1]]),
     xlab="Intercept", ylab="density",
     main="Posterior marginal for fixed effect")

plot(inla.smarginal(res$marginals.hyperpar[[1]]),
     xlab="tau", ylab="density",
     main="Posterior for the hyperparameter tau")

plot(inla.smarginal(
        inla.tmarginal(function(x)1/sqrt(x), 
            res$marginals.hyperpar[[1]])), 
     xlab="sigma", ylab="density",
     main="Posterior marginal for transformed hyperparameter sigma")

# FB: Why is marginals.random empty?? = beta??

inla.zmarginal(inla.tmarginal(function(x)1/sqrt(x),res$marginals.hyperpar[[1]]))
```

FB: What does all this mean?.....


## 3.2 GLMM with random effects

Last, we use INLA to analyse the "Seeds" data set.

```{r GLMM1, echo=FALSE}
library(INLA)
# Load dataset
data(Seeds)

# Prepare INLA call
data = Seeds
formula = r ~ x1 + x2 + f(plate, model="iid")

# Call INLA
res = inla(formula=formula, data=data,
           family="binomial", Ntrials=n,
           control.family = list(link = "logit"), 
           control.predictor = list(compute=TRUE))

# Post processing
summary(res)

par(mfrow=c(1,3))
plot(inla.smarginal(res$marginals.fixed[[1]]),
     xlab="Intercept", ylab="density",
     main="Posterior marginal for fixed effect")

plot(inla.smarginal(res$marginals.fixed[[2]]),
     xlab="a1", ylab="density",
     main="Posterior marginal for fixed effect")

plot(inla.smarginal(res$marginals.fixed[[3]]),
     xlab="a2", ylab="density",
     main="Posterior marginal for fixed effect")


# FB: how to get p_i? Or what is of interest here?
par(mfrow=c(1,1))
plot(res$summary.fitted.values$mean,
     xlab="plate", ylab="p",
     main="Posterior marginal mean for p")

```